Perceptron Training for a Wide-Coverage Lexicalized-Grammar Parser
نویسندگان
چکیده
This paper investigates perceptron training for a wide-coverage CCG parser and compares the perceptron with a log-linear model. The CCG parser uses a phrase-structure parsing model and dynamic programming in the form of the Viterbi algorithm to find the highest scoring derivation. The difficulty in using the perceptron for a phrase-structure parsing model is the need for an efficient decoder. We exploit the lexicalized nature of CCG by using a finite-state supertagger to do much of the parsing work, resulting in a highly efficient decoder. The perceptron performs as well as the log-linear model; it trains in a few hours on a single machine; and it requires only a few hundred MB of RAM for practical training compared to 20 GB for the log-linear model. We also investigate the order in which the training examples are presented to the online perceptron learner, and find that order does not significantly affect the results.
منابع مشابه
Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models
This paper describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are “full” parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminative training is used to estimate the models, which requires incorrect parses for each sentence in t...
متن کاملITRI-98-02 A structure-sharing parser for lexicalized grammars
In wide-coverage lexicalized grammars many of the elementary structures have substructures in common. This means that in conventional parsing algorithms some of the computation associated with different structures is duplicated. In this paper we describe a precompilation technique for such grammars which allows some of this computation to be shared. In our approach the elementary structures of ...
متن کاملAdapting a Lexicalized-Grammar Parser to Contrasting Domains
Most state-of-the-art wide-coverage parsers are trained on newspaper text and suffer a loss of accuracy in other domains, making parser adaptation a pressing issue. In this paper we demonstrate that a CCG parser can be adapted to two new domains, biomedical text and questions for a QA system, by using manually-annotated training data at the POS and lexical category levels only. This approach ac...
متن کاملStatus of the XTAG
Technical Report TALANA-RT-94-01, TALANA, Universite' Paris 7, 1994. Status of the XTAG System C. Doran, D. Egedi, B. A. Hockey, B. Srinivas Institute for Research in Cognitive Science University of Pennsylvania Philadelphia, PA 19104-6228, USA fcdoran, egedi, beth, [email protected] Abstract XTAG is an ongoing project to develop a wide-coverage grammar for English, based on the Featur...
متن کاملPartial Training for a Lexicalized-Grammar Parser
We propose a solution to the annotation bottleneck for statistical parsing, by exploiting the lexicalized nature of Combinatory Categorial Grammar (CCG). The parsing model uses predicate-argument dependencies for training, which are derived from sequences of CCG lexical categories rather than full derivations. A simple method is used for extracting dependencies from lexical category sequences, ...
متن کامل